NSF PAR Search | NSF Public Access Repository

Identification of Diverse Bacteriophages Associated with Bees and Hoverflies

https://doi.org/10.3390/v17020201

Bandoo, Rohan A; Kraberger, Simona; Ozturk, Cahit; Lund, Michael C; Zhu, Qiyun; Cook, Chelsea; Smith, Brian; Varsani, Arvind (February 2025, Viruses)

Bacteriophages are the most numerous, ubiquitous, and diverse biological entities on the planet. Prior studies have identified bacteriophages associated with pathogenic and commensal microbiota of honeybees. In this study we expand on what is known about bacteriophages from the lineages Caudoviricetes, Inoviridae, and Microviridae, which are associated with honeybees (Apidae, Apis mellifera), solitary bees of the genus Nomia (Halictidae, Nomia), and hoverflies (Syrphidae). The complete genomes of seven caudoviruses, seven inoviruses, and 288 microviruses were assembled from honeybees (n = 286) and hoverflies in Arizona (n = 2). We used bacterial host predictive software and sequence read mapping programs to infer the commensal and transient bacterial hosts of pollinating insects. Lastly, this study explores the phylogenetic relationships of microviruses sampled from bees, opportunistically sampled pollinating insects such as hoverflies, and blackflies.
Free, publicly-accessible full text available February 1, 2026
Generation of accurate, expandable phylogenomic trees with uDance

https://doi.org/10.1038/s41587-023-01868-8

Balaban, Metin; Jiang, Yueyu; Zhu, Qiyun; McDonald, Daniel; Knight, Rob; Mirarab, Siavash (May 2024, Nature Biotechnology)

Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single-marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data are not scalable to large numbers of genomes. We introduce updating trees using divide-and-conquer (uDance), a method that enables updatable genome-wide inference using a divide-and-conquer strategy that refines different parts of the tree independently and can build off of existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.
Full Text Available
BinaRena: a dedicated interactive platform for human-guided exploration and binning of metagenomes

https://doi.org/10.1186/s40168-023-01625-8

Pavia, Michael J.; Chede, Abhinav; Wu, Zijun; Cadillo-Quiroz, Hinsby; Zhu, Qiyun (August 2023, Microbiome)

Abstract
Background: Exploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process however is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure.
Results: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset.
Conclusions: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.
DEPP: Deep Learning Enables Extending Species Trees using Single Genes

https://doi.org/10.1093/sysbio/syac031

Jiang, Yueyu; Balaban, Metin; Zhu, Qiyun; Mirarab, Siavash (April 2022, Systematic Biology)
Solis-Lemus, Claudia (Ed.)

Abstract Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without prespecified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multilocus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data. [Deep learning; gene tree discordance; metagenomics; microbiome analyses; neural networks; phylogenetic placement.]
Fast and accurate distance‐based phylogenetic placement using divide and conquer

https://doi.org/10.1111/1755-0998.13527

Balaban, Metin; Jiang, Yueyu; Roush, Daniel; Zhu, Qiyun; Mirarab, Siavash (January 2021, Molecular Ecology Resources)

Full Text Available
Greengenes2 unifies microbial data in a single reference tree

https://doi.org/10.1038/s41587-023-01845-1

McDonald, Daniel; Jiang, Yueyu; Balaban, Metin; Cantrell, Kalen; Zhu, Qiyun; Gonzalez, Antonio; Morton, James T.; Nicolaou, Giorgia; Parks, Donovan H.; Karst, Søren M.; et al (July 2023, Nature Biotechnology)

Abstract Studies using 16S rRNA and shotgun metagenomics typically yield different results, usually attributed to PCR amplification biases. We introduce Greengenes2, a reference tree that unifies genomic and 16S rRNA databases in a consistent, integrated resource. By inserting sequences into a whole-genome phylogeny, we show that 16S rRNA and shotgun metagenomic data generated from the same samples agree in principal coordinates space, taxonomy and phenotype effect size when analyzed with the same tree.
Swapping Metagenomics Preprocessing Pipeline Components Offers Speed and Sensitivity Increases

https://doi.org/10.1128/msystems.01378-21

Armstrong, George; Martino, Cameron; Morris, Justin; Khaleghi, Behnam; Kang, Jaeyoung; DeReus, Jeff; Zhu, Qiyun; Roush, Daniel; McDonald, Daniel; Gonzalez, Antonio; et al (April 2022, mSystems)
Mackelprang, Rachel (Ed.)
ABSTRACT Increasing data volumes on high-throughput sequencing instruments such as the NovaSeq 6000 leads to long computational bottlenecks for common metagenomics data preprocessing tasks such as adaptor and primer trimming and host removal. Here, we test whether faster recently developed computational tools (Fastp and Minimap2) can replace widely used choices (Atropos and Bowtie2), obtaining dramatic accelerations with additional sensitivity and minimal loss of specificity for these tasks. Furthermore, the taxonomic tables resulting from downstream processing provide biologically comparable results. However, we demonstrate that for taxonomic assignment, Bowtie2’s specificity is still required. We suggest that periodic reevaluation of pipeline components, together with improvements to standardized APIs to chain them together, will greatly enhance the efficiency of common bioinformatics tasks while also facilitating incorporation of further optimized steps running on GPUs, FPGAs, or other architectures. We also note that a detailed exploration of available algorithms and pipeline components is an important step that should be taken before optimization of less efficient algorithms on advanced or nonstandard hardware.
IMPORTANCE In shotgun metagenomics studies that seek to relate changes in microbial DNA across samples, processing the data on a computer often takes longer than obtaining the data from the sequencing instrument. Recently developed software packages that perform individual steps in the pipeline of data processing in principle offer speed advantages, but in practice they may contain pitfalls that prevent their use, for example, they may make approximations that introduce unacceptable errors in the data. Here, we show that differences in choices of these components can speed up overall data processing by 5-fold or more on the same hardware while maintaining a high degree of correctness, greatly reducing the time taken to interpret results. This is an important step for using the data in clinical settings, where the time taken to obtain the results may be critical for guiding treatment.
Full Text Available
A gut-derived metabolite alters brain activity and anxiety behaviour in mice

https://doi.org/10.1038/s41586-022-04396-8

Needham, Brittany D.; Funabashi, Masanori; Adame, Mark D.; Wang, Zhuo; Boktor, Joseph C.; Haney, Jillian; Wu, Wei-Li; Rabut, Claire; Ladinsky, Mark S.; Hwang, Son-Jong; et al (February 2022, Nature)

Full Text Available
Compositional and genetic alterations in Graves’ disease gut microbiome reveal specific diagnostic biomarkers

https://doi.org/10.1038/s41396-021-01016-7

Zhu, Qiyun; Hou, Qiangchuan; Huang, Shi; Ou, Qianying; Huo, Dongxue; Vázquez-Baeza, Yoshiki; Cen, Chaoping; Cantu, Victor; Estaki, Mehrbod; Chang, Haibo; et al (June 2021, The ISME Journal)

Abstract Graves’ Disease is the most common organ-specific autoimmune disease and has been linked in small pilot studies to taxonomic markers within the gut microbiome. Important limitations of this work include small sample sizes and low-resolution taxonomic markers. Accordingly, we studied 162 gut microbiomes of mild and severe Graves’ disease (GD) patients and healthy controls. Taxonomic and functional analyses based on metagenome-assembled genomes (MAGs) and MAG-annotated genes, together with predicted metabolic functions and metabolite profiles, revealed a well-defined network of MAGs, genes and clinical indexes separating healthy from GD subjects. A supervised classification model identified a combination of biomarkers including microbial species, MAGs, genes and SNPs, with predictive power superior to models from any single biomarker type (AUC = 0.98). Global, cross-disease multi-cohort analysis of gut microbiomes revealed high specificity of these GD biomarkers, notably discriminating against Parkinson’s Disease, and suggesting that non-invasive stool-based diagnostics will be useful for these diseases.
Optimizing sequencing protocols for leaderboard metagenomics by combining long and short reads

https://doi.org/10.1186/s13059-019-1834-9

Sanders, Jon G.; Nurk, Sergey; Salido, Rodolfo A.; Minich, Jeremiah; Xu, Zhenjiang Z.; Zhu, Qiyun; Martino, Cameron; Fedarko, Marcus; Arthur, Timothy D.; Chen, Feng; et al (December 2019, Genome Biology)

Full Text Available
